Abstract. Let $V$ be a finite set of $n$ elements and $\mathcal{F} = \{X_1, X_2, \ldots, X_m\}$
a family of $m$ subsets of $V$. Two sets $X_i$ and $X_j$ of $\mathcal{F}$ overlap if
$X_i \cap X_j \neq \emptyset$, $X_j \setminus X_i \neq \emptyset$, and $X_i \setminus X_j \neq \emptyset$.
Two sets $X, Y \in \mathcal{F}$ are in the same overlap class if there is a series
$X = X_1, X_2, \ldots, X_k = Y$ of sets of $\mathcal{F}$ in which each pair $X_i, X_{i+1}$ overlaps.
In this note, we focus on efficiently identifying all overlap classes in
$O(n + \sum_{i=1}^{m} |X_i|)$ time. We thus revisit the clever algorithm of Dahlhaus [2],
of which we give a clear presentation and which we simplify to make it practical
and implementable in its real worst case complexity. A useful variant of
Dahlhaus's approach is also explained.
1 Introduction

Let $V$ be a finite set of $n = |V|$ elements and $\mathcal{F} = \{X_1, X_2, \ldots, X_m\}$ a family of
$m$ subsets of $V$. Two sets $X_i$ and $X_j$ of $\mathcal{F}$ overlap if $X_i \cap X_j \neq \emptyset$,
$X_i \setminus X_j \neq \emptyset$, and $X_j \setminus X_i \neq \emptyset$. We denote by $|\mathcal{F}|$ the sum of the sizes
of all $X_i \in \mathcal{F}$. We define the overlap graph $OG(\mathcal{F}, E)$ as the graph with all $X_i$
as vertices and $E = \{(i, j) \mid X_i \text{ overlaps } X_j\}$, $1 \leq i, j \leq m$. A connected
component of this graph is called an overlap class.
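The definitions above can be made concrete with a direct, quadratic-time reference implementation. This is only a sketch for checking small inputs, not the algorithm discussed in this note; the function names `overlaps` and `overlap_classes_bruteforce` are hypothetical, and sets are assumed to be Python `set` objects.

```python
from itertools import combinations

def overlaps(x, y):
    """True when x and y intersect and neither one contains the other."""
    return bool(x & y) and bool(x - y) and bool(y - x)

def overlap_classes_bruteforce(family):
    """Connected components of the overlap graph, testing every pair of sets.

    Quadratic in the number of sets: a baseline for small inputs only.
    `family` is a list of sets; classes are returned as lists of indices.
    """
    parent = list(range(len(family)))  # union-find forest over set indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(family)), 2):
        if overlaps(family[i], family[j]):
            parent[find(i)] = find(j)

    classes = {}
    for i in range(len(family)):
        classes.setdefault(find(i), []).append(i)
    return sorted(classes.values())
```

Such a baseline is mainly useful to cross-check a faster implementation on random small families.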

In this note we focus on efficiently identifying all overlap classes of $OG(\mathcal{F}, E)$.
This problem is a classical one in topics related to graph clustering, but it also
appears frequently in graph problems related to graph decomposition [5]
or PQ-tree manipulation [3].
An efficient $\Theta(n + |\mathcal{F}|)$ time algorithm has already been presented by Dahlhaus
in [2]. The algorithm is very clever but uses an off-line Lowest Common Ancestor
(LCA) algorithm as a subroutine. From a theoretical point of view, off-line LCA
queries have been proved to be solvable in constant time (after a linear time
preprocessing) in a RAM model (accepting an additional constant time specific
register operation) but also recently in a pointer machine model [?]. However,
in practice, it is very difficult to implement these LCA algorithms in their real
linear complexity. Another difficulty with Dahlhaus's algorithm is that
its original presentation is difficult to follow. These two points motivated this
note. Dahlhaus's algorithm is really clever and deserves a clear presentation,
all the more so since we show how to replace LCA queries by set partitioning, which
makes Dahlhaus's algorithm easily implementable in practice in its real
complexity. We also provide source code, freely available in [?]. Finally, we explain
how to simply modify Dahlhaus's approach to efficiently compute a spanning
tree of each connected component of the overlap graph. This simplifies a graph
construction in [3].

2 Dahlhaus's algorithm 

The overlap graph $OG(\mathcal{F}, E)$ might have $\Theta(m^2)$ edges, which can be quadratic
in $|\mathcal{F}|$. For instance, if $\mathcal{F} = \{\{x_1, x_2\}, \{x_1, x_3\}, \ldots, \{x_1, x_{m+1}\}\}$, then
$|E| = m(m-1)/2 = \Theta(m^2)$.

The approach of Dahlhaus is quite surprising since, instead of computing
a subgraph of the overlap graph, Dahlhaus considers a second graph $D(\mathcal{F}, L)$ on
the same vertex set but with different edges. This graph nevertheless has a strong
property: its connected components are the same as those of $OG(\mathcal{F}, E)$,
although in the general case $D(\mathcal{F}, L)$ is not a subgraph of $OG(\mathcal{F}, E)$.

Let LF be the list of all $X \in \mathcal{F}$ sorted in decreasing size order. The ordering
of sets of equal size is arbitrarily fixed. Given $X \in \mathcal{F}$, we denote by Max($X$) the
largest $Y \in \mathcal{F}$ taken in LF order such that $|Y| \geq |X|$ and $Y$ overlaps $X$. Note
that Max($X$) might be undefined for some sets of $\mathcal{F}$. In this latter case, in order
to simplify the presentation of some technical points, we write Max($X$) $= \emptyset$.
Dahlhaus's algorithm is based on the following observation:

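Lemma 1 ([2]). Let $X \in \mathcal{F}$ such that Max($X$) $\neq \emptyset$. Then for all $Y \in \mathcal{F}$ such
that $Y \cap X \neq \emptyset$ and $|X| \leq |Y| \leq |$Max($X$)$|$, $Y$ overlaps $X$ or Max($X$).

Proof. If $Y$ does not overlap $X$, as $|X| \leq |Y|$ and $Y \cap X \neq \emptyset$, $X \subseteq Y$. Thus
$Y \cap$ Max($X$) $\neq \emptyset$. Then, if $Y$ does not overlap Max($X$), one of the two sets contains
the other. A strict inclusion $Y \subset$ Max($X$) is impossible, since it would give
$X \subseteq$ Max($X$), so Max($X$) $\subseteq Y$.
But in this case, as $|Y| \leq |$Max($X$)$|$, $Y =$ Max($X$) and overlaps $X$. Therefore $Y$
overlaps $X$ or Max($X$). □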

Let us assume that we already computed all Max($X$). For each $v \in V$ we
compute the list $SL(v)$ of all sets $X \in \mathcal{F}$ to which $v$ belongs. This list is sorted
in increasing order of the sizes of the sets. Computing and sorting all lists for all
$v \in V$ can be done in $O(|\mathcal{F}|)$ time using a global bucket sort.
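One possible way to realise the global bucket sort is sketched below. It is an illustration under stated assumptions rather than the authors' code: `build_sl_lists` is a hypothetical name, sets are referred to by their index in `family`, and ties between sets of equal size are broken by index.

```python
def build_sl_lists(universe, family):
    """Return {v: indices of the sets containing v, by increasing size}.

    Runs in O(n + m + |F|): one pass to bucket the sets by size, one pass
    over the buckets to append each set to the lists of its elements.
    """
    n = len(universe)
    # Bucket the set indices by size (a subset of V has size 0..n).
    buckets = [[] for _ in range(n + 1)]
    for idx, x in enumerate(family):
        buckets[len(x)].append(idx)
    # Sweep the buckets from small to large; because sets arrive in
    # increasing size, every SL list is sorted when the sweep ends.
    sl = {v: [] for v in universe}
    for bucket in buckets:
        for idx in bucket:
            for v in family[idx]:
                sl[v].append(idx)
    return sl
```

No comparison sort is involved, which is what keeps the construction linear.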

Dahlhaus's graph $D(\mathcal{F}, L)$ is built on those lists. Let $X$ be a set containing $v$
such that Max($X$) $\neq \emptyset$. Then for all consecutive pairs $YW$ after $X$ in $SL(v)$ ($X$
included, i.e. $Y$ can be instantiated by $X$) and such that $|W| \leq |$Max($X$)$|$, create
an edge $(Y, W)$ in the graph $D$.

Lemma 2 ([2]). The two graphs $D(\mathcal{F}, L)$ and $OG(\mathcal{F}, E)$ have the same
connected components.

Proof. ($\Rightarrow$) Let $Y, W \in \mathcal{F}$ such that $(Y, W) \in L$. By construction there exists $v$
such that $Y$ and $W$ are consecutive on $SL(v)$ and there exists $X$ that appears
before $YW$ on $SL(v)$ such that Max($X$) $\neq \emptyset$ and such that $|X| \leq |Y| \leq |W| \leq |$Max($X$)$|$.
By Lemma 1, $Y$ and $W$ each overlap either $X$ or Max($X$). As $X$ and
Max($X$) overlap, the sets $X$, $Y$, $W$, and Max($X$) belong to the same overlap
class of $OG(\mathcal{F}, E)$. By extension, the vertices of any path in $D(\mathcal{F}, L)$
belong to the same overlap class of $OG(\mathcal{F}, E)$.

($\Leftarrow$) Let $A, B \in \mathcal{F}$ be two overlapping sets, i.e. $(A, B) \in E$. Let $v \in A \cap B$.
Assume w.l.o.g. that $|A| \leq |B|$. Then Max($A$) $\neq \emptyset$ and $|$Max($A$)$| \geq |B|$.
Therefore, in $SL(v)$, there exists a series of consecutive pairs $YW$ from $A$ to $B$
that are linked in $D(\mathcal{F}, L)$. Consequently, $A$ and $B$ are connected in $D(\mathcal{F}, L)$.
□

Notice that the order of equally sized sets in the $SL$ lists has no importance for
the construction of Dahlhaus's graph. Figure 1 shows an example of an overlap
graph and of a Dahlhaus's graph.

Fig. 1. Global example: (A) input family of 11 sets; (B) overlap graph; (C) SL
lists; (D) Dahlhaus's graph. In (C), intervals defined by Max($X$) are overlined.
Notice that Dahlhaus's graph is not a subgraph of the overlap graph.



Lemma 3 ([2]). Given all Max($X$), $X \in \mathcal{F}$, the graph $D(\mathcal{F}, L)$ can be built in
$O(|\mathcal{F}|)$ time and its number of edges is less than or equal to $|\mathcal{F}|$.



Proof. To build the graph $D(\mathcal{F}, L)$ from the $SL$ lists, it suffices to go through
each $SL$ list from the smallest set to the largest and remember at each step the
largest value $|$Max($X$)$|$ already seen. If the size of the current set is smaller than or
equal to this value, an edge is created between the last two sets considered.

Let us now consider the number of edges of $D(\mathcal{F}, L)$. As at most one edge is
created for each set in a list $SL$, at most $|\mathcal{F}|$ edges are created after processing
all lists. □
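The sweep used in this proof can be sketched as follows. The code assumes that the $SL$ lists and all Max($X$) are already available; the name `dahlhaus_edges` and the index-based representation (`max_of[i]` holding the index of Max($X_i$) or `None`) are assumptions made for the illustration.

```python
def dahlhaus_edges(family, sl, max_of):
    """Edges of D, obtained by one left-to-right sweep of every SL list.

    family : list of sets
    sl     : {v: indices of the sets containing v, by increasing size}
    max_of : list with max_of[i] = index of Max(X_i), or None if undefined
    Edges are returned as a set of index pairs (small index first).
    """
    edges = set()
    for lst in sl.values():
        bound = -1      # largest |Max(X)| among the sets already passed
        prev = None     # set visited just before the current one
        for cur in lst:
            # The current set falls inside an interval opened by an
            # earlier X when its size does not exceed the bound.
            if prev is not None and len(family[cur]) <= bound:
                edges.add((min(prev, cur), max(prev, cur)))
            if max_of[cur] is not None:
                bound = max(bound, len(family[max_of[cur]]))
            prev = cur
    return edges
```

Storing the edges in a set removes duplicates created by different lists; the bound of Lemma 3 counts them with multiplicity.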



Identifying the overlap classes of $OG(\mathcal{F}, E)$ can therefore be done by a simple
Depth First Search on $D(\mathcal{F}, L)$ in $O(n + |\mathcal{F}|)$ time. It remains however to explain
how to efficiently compute all Max($X$).

3 Computing all Max(X) 



Let LF be the list of all $X \in \mathcal{F}$ sorted in decreasing size order. The order of sets
of equal size is not important. We consider a boolean matrix BM of size $m \times |V|$
such that each row represents a set $X \in \mathcal{F}$ in the order of LF, and each column
an element $v \in V$. The value BM$[i, j]$ is 1 if and only if $v_j \in X_i$.

The first step of Dahlhaus's algorithm is to sort the columns of BM in
lexicographical order, although there is no detail in [5] on how to do it efficiently
in $O(|\mathcal{F}|)$ time. We postpone all explanations concerning this step to Section
3.2 and we consider below that all columns of BM are lexicographically sorted.
Figure 2 shows the BM matrix for the set family of Figure 1.

Fig. 2. Example continued: BM matrix whose rows are sorted by decreasing
sizes of $X \in \mathcal{F}$ and whose columns are sorted in lexicographic order.



For each $X \in \mathcal{F}$ we denote by left($X$) (resp. right($X$)) the number of the column
of BM containing the leftmost (resp. rightmost) 1 in the row of $X$.

Lemma 4. Let $X, Y \in \mathcal{F}$ such that $Y$ overlaps $X$ and let $r_Y$ be the row of
$Y$ in BM. Then there exists a row $t$ higher than or equal to $r_Y$ such that
BM$[t, \text{left}(X)] = 0$ and BM$[t, \text{right}(X)] = 1$.

Proof. As $Y$ overlaps $X$, $|X| \geq 2$. Let $r_X$ be the row corresponding to $X$ in
BM. Since $Y$ overlaps $X$, there exist two indices $1 \leq i < j \leq |V|$ and a row $r$
such that BM$[r_X, i] =$ BM$[r_X, j] = 1$, and such that one of the values BM$[r, i]$ and
BM$[r, j]$ is 1 and the other 0.

We consider the highest $r$ that satisfies these conditions.

In a first step, if BM$[r, i] = 1$ and BM$[r, j] = 0$, then, as $i < j$ and as all
columns have been sorted in increasing lexicographical order, there must exist a
row $r'$ higher than $r$ such that BM$[r', i] = 0$ and BM$[r', j] = 1$. We thus consider
now w.l.o.g. that BM$[r, i] = 0$ and BM$[r, j] = 1$.

Among all pairs of indices $i$ and $j$ such that BM$[r_X, i] =$ BM$[r_X, j] = 1$ and
such that there exists $r$ with BM$[r, i] = 0$ and BM$[r, j] = 1$, let us consider one
pair $i'$ and $j'$, $1 \leq i' < j' \leq |V|$, that is associated to the highest such $r$, which we
denote $t$.

We now prove that BM$[t, \text{left}(X)] = 0$ and BM$[t, \text{right}(X)] = 1$. If BM$[t, \text{left}(X)] = 1$,
then $i' > \text{left}(X)$ and, as BM$[t, i'] = 0$ and the columns are sorted in
lexicographical order, there should exist a higher row $r'$ such that BM$[r', \text{left}(X)] = 0$
and BM$[r', i'] = 1$, which contradicts $t$ being the highest such row. Thus
BM$[t, \text{left}(X)] = 0$. Symmetrically, the same argument holds to prove that
BM$[t, \text{right}(X)] = 1$. □

Lemma 5. Let $X \in \mathcal{F}$. Then Max($X$) $\neq \emptyset$ if and only if there exists a row $t$
in BM such that BM$[t, \text{left}(X)] = 0$ and BM$[t, \text{right}(X)] = 1$ corresponding to a
set $Y \in \mathcal{F}$ verifying $|Y| \geq |X|$.

Proof. ($\Leftarrow$) If a set $Y$ corresponds to a row $t$ in BM such that BM$[t, \text{left}(X)] = 0$
and BM$[t, \text{right}(X)] = 1$, $Y$ obviously overlaps $X$. As $|Y| \geq |X|$, Max($X$) $\neq \emptyset$.
($\Rightarrow$) Let us assume that Max($X$) $\neq \emptyset$ and let $r_M$ be its row in BM. Then,
by Lemma 4, there exists a row $t$ in BM such that BM$[t, \text{left}(X)] = 0$ and
BM$[t, \text{right}(X)] = 1$ and such that $t$ is higher than or equal to $r_M$. As Max($X$)
verifies $|$Max($X$)$| \geq |X|$ and rows are sorted by decreasing size, the set $Y$
corresponding to $t$ is also such that $|Y| \geq |X|$. □

Lemma 6 ([2]). Let $X \in \mathcal{F}$ such that Max($X$) $\neq \emptyset$. Then Max($X$) corresponds
to the highest row $t$ in BM such that BM$[t, \text{left}(X)] = 0$ and BM$[t, \text{right}(X)] = 1$.

[Notice that this row might be lower than the row corresponding to $X$. This is
the case for $X_8$ and $X_{10}$ since Max($X_{10}$) $= X_8$ but also Max($X_8$) $= X_{10}$ in our
example.]

Proof. Let us assume that Max($X$) $\neq \emptyset$ and let $r_M$ be its row in BM. Then,
by Lemma 4, there exists a row $t$ in BM such that BM$[t, \text{left}(X)] = 0$ and
BM$[t, \text{right}(X)] = 1$ and such that $t$ is higher than or equal to $r_M$. However, as
such a row $t$ corresponds to a set overlapping $X$ and as Max($X$) is the largest
of those sets in LF order, $t = r_M$. □

For example, in Figure 2, Max($X_1$) $= X_9$ since left($X_1$) $= 1$, right($X_1$) $= 6$
and $X_9$ (row 2) corresponds to the highest row with 0 on the first column and
1 on the 6th.
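Lemmas 5 and 6 together give a direct, if slow, way to read Max($X$) off the sorted matrix. The sketch below materialises BM and scans its rows, so it is far from linear; it is only meant to illustrate the characterisation. The name `max_by_matrix`, the index-based output and the tie-breaking of LF by index are assumptions.

```python
def max_by_matrix(universe, family):
    """Max(X) for every X, read off the column-sorted matrix BM.

    Returns a list max_of with max_of[i] = index of Max(X_i), or None.
    Deliberately naive: the matrix is built explicitly and rows are scanned.
    """
    # LF order: decreasing size, ties broken by index (an arbitrary choice).
    lf = sorted(range(len(family)), key=lambda i: (-len(family[i]), i))
    # A column is the 0/1 vector of one element, read from the top row down.
    column = {v: tuple(int(v in family[i]) for i in lf) for v in universe}
    order = sorted(universe, key=lambda v: column[v])  # lexicographic sort
    pos = {v: p for p, v in enumerate(order)}

    max_of = [None] * len(family)
    for i, x in enumerate(family):
        if not x:
            continue
        left = order[min(pos[v] for v in x)]    # element in column left(X)
        right = order[max(pos[v] for v in x)]   # element in column right(X)
        for t in lf:                            # rows from highest to lowest
            if left not in family[t] and right in family[t]:
                if len(family[t]) >= len(x):    # size condition of Lemma 5
                    max_of[i] = t
                break                           # only the highest row counts
    return max_of
```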

Dahlhaus's approach for computing all Max($X$) is to identify for each row
$r$ corresponding to $X$ the highest row $t$ such that BM$[t, \text{left}(X)] = 0$ and
BM$[t, \text{right}(X)] = 1$. To do it efficiently, Dahlhaus reduces the problem to LCA
computations. We explain this reduction in Section 3.1. We then present
another approach using class partitions in Section 3.2. This new approach is much
simpler to implement than the LCA algorithm in its real linear worst case
complexity. Moreover, it allows an easy computation of the lexicographical order of the
columns.



3.1 Computing all Max(X) using LCA 

Let us consider all intermediate columns between all pairs of adjacent columns in BM.
In those columns, for each row, we place a point • between each motif 01 or 10.
This is shown in Figure 3 (left). We link the highest point of each intermediate
column, if it exists, in a Dahlhaus's tree (DT) the following way:

1. the root of the tree is the highest point. There can be only one root and
there must be one root if one of the sets $X \in \mathcal{F}$ differs from $V$. We assume
this below;

2. we recurse the following process: each new point $np$ in the tree (root included)
splits the submatrix in two subparts according to the intermediate column
it is placed in; the left (resp. right) child of $np$ is the highest point in the left
(resp. right) part, if it exists. Note that the lexicographical order of the columns
of BM ensures that there can be at most one highest point in each part;

3. when a subpart does not contain any new point, a leaf per BM column in
this subpart is created and attached as child to the point that created the
subpart. If this point is placed to the left (resp. right) of this column, the
child is a right (resp. left) child. Each leaf is numbered with the number of
the corresponding column in BM.

An instance of such a tree is given in Figure 3 (right).

Fig. 3. Example continued: Dahlhaus's tree built over a BM matrix.



Proposition 1 ([2]). Let $X \in \mathcal{F}$. Let $Y \in \mathcal{F}$ be the set corresponding to the row
of LCA(left($X$), right($X$)) in BM. If $|Y| \geq |X|$, then $Y =$ Max($X$). Otherwise
Max($X$) $= \emptyset$.

Proof. Let $r$ be the number of the row of LCA(left($X$), right($X$)) in BM and let
$l$ be the position of the column in BM that is just before the point representing
LCA(left($X$), right($X$)).

First, BM$[r, l] = 0$ and BM$[r, l+1] = 1$. Suppose a contrario that BM$[r, l] = 1$
and BM$[r, l+1] = 0$. As all columns of BM are sorted in lexicographical order,
there must exist a higher row $r'$ such that BM$[r', l] = 0$ and BM$[r', l+1] = 1$,
and thus a point in the intermediate column between $l$ and $l+1$ higher than that
in row $r$, which contradicts the construction of DT.

We now prove that BM$[r, \text{left}(X)] = 0$ and BM$[r, \text{right}(X)] = 1$. A contrario,
suppose that BM$[r, \text{left}(X)] = 1$. Then, again, as the columns of BM
are sorted in lexicographical order, there must exist a higher row $r'$ such that
BM$[r', \text{left}(X)] = 0$ and BM$[r', l] = 1$. This again contradicts the construction
of DT. A similar argument holds for the right side.

We then prove that $r$ is the highest row with this property. Assume a contrario
that there exists a higher row $r'$ such that BM$[r', \text{left}(X)] = 0$ and
BM$[r', \text{right}(X)] = 1$. Then there would have been a split 01 somewhere in this
row that would have separated left($X$) and right($X$). This implies that there
would have been a node in DT in a row higher than or equal to $r'$ that would
have split left($X$) and right($X$), which contradicts $r$ being the number of the row
of LCA(left($X$), right($X$)).

If $|Y| \geq |X|$, by Lemma 5 Max($X$) $\neq \emptyset$ and the set $Y$ that corresponds to $r$
is such that $Y =$ Max($X$).

If $|Y| < |X|$, since no row $r'$ higher than $r$ can verify BM$[r', \text{left}(X)] = 0$
and BM$[r', \text{right}(X)] = 1$, by Lemma 5 Max($X$) $= \emptyset$. □

For example, $X_9$ corresponds to the row of LCA(1, 2) = LCA(left($X_{11}$), right($X_{11}$)).
As $|X_9| \geq |X_{11}|$, $X_9 =$ Max($X_{11}$).

3.2 Computing all Max(X) using set partitioning

We present below an alternative approach that permits avoiding LCA queries. 
Moreover, the lexicographical column order appears as a by-product. 

We manipulate sorted partitions of $V$ that we refine by each $X \in \mathcal{F}$ taken in
LF order, that is, in decreasing order of their sizes. The initial partition is the
whole set $V$ and is denoted $P_V$. For clarity, a set in a partition is called a part. In
each partition the order of the parts is important, but the order of elements in
a same part is not. Let $C = \{v_1, \ldots, v_k\}$ be a part in a partition. Refining $C$ by
$X \in \mathcal{F}$ consists in extracting all $v_i \in X$ in $C$ and creating a new part $C''$ with
all those $v_i$. The remaining $v_i \notin X$ in $C$ form a new part $C'$ and $C$ is replaced
in the current partition by $C'C''$. If $C$ only contains elements of $X$, or
if it contains none, $C$ remains unchanged in the partition. Refining a partition
$P$ by a set $X \in \mathcal{F}$ consists in refining successively all parts in $P$. We note this
refinement $P|_X$.

For example (continued), if $P = \{a\}\{i, j, k, l\}\{b\}\{c, d\}\{e, f, g, h\}$ and
$X = \{d, e\}$, $P|_X = \{a\}\{i, j, k, l\}\{b\}\{c\}\{d\}\{f, g, h\}\{e\}$.
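The refinement operation can be written in a few lines when a partition is stored as a list of lists. This sketch favours readability: each call costs time proportional to $n$ rather than to $|X|$, so it does not reach the complexity targeted in this note. The function name `refine` is hypothetical.

```python
def refine(partition, x):
    """Refine an ordered partition (a list of lists) by the set x.

    Each part C cut by x is replaced by C' (its elements outside x)
    followed by C'' (its elements inside x); other parts are kept as is.
    """
    refined = []
    for part in partition:
        inside = [v for v in part if v in x]
        outside = [v for v in part if v not in x]
        if inside and outside:
            refined.append(outside)   # C'
            refined.append(inside)    # C''
        else:
            refined.append(part)      # C inside x, or disjoint from x
    return refined
```

Applied to the partition and the set of the example above, it returns the parts in the order stated there.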

Our approach requires 3 steps:

1. refine $P_V$ by all $X \in \mathcal{F}$ taken in LF order;

2. then compute for each $X \in \mathcal{F}$ the values of left($X$) and right($X$) and sort
all $X \in \mathcal{F}$ in a special order with regard to these values;

3. finally refine $P_V$ again by all $X \in \mathcal{F}$ taken in LF order, but using the
information computed in step 2 to compute all Max($X$).

We detail below each step. 

Step 1 - Refining $P_V$. Let us consider the final partition we obtain after
refining $P_V$ by each $X \in \mathcal{F}$ taken in LF order. We note this partition $P_f$.

Lemma 7. The elements of $P_f$ are sorted according to the lexicographical
order of the columns of BM.

Proof. Refining a partition consists in lexicographically sorting a row of BM,
touching only the 1s in the row but also keeping the global order already defined
by the sets in the partition. Thus refining partitions from $P_V$ in LF order consists
in lexicographically ordering BM from the top row to the bottom. □

For example (continued), on the data in Figure 1, $P_f = \{a\}\{i\}\{l\}\{j\}\{k\}\{b\}\{c\}\{d\}\{h\}\{f, g\}\{e\}$.
Note that equal columns of BM are in the same part of $P_f$,
on which we fix an arbitrary order.

Step 2 - Computing all left($X$) and right($X$) values. We then compute
all left($X$) and right($X$) values on $P_f$. This can be done easily in $O(|\mathcal{F}| + n)$
time by scanning each $X \in \mathcal{F}$ and keeping the minimum and maximum position
of one of its elements in $P_f$. We also compute a data structure AM that for each
position $1 \leq i \leq |V|$ of $P_f$ gives a list of all $X \in \mathcal{F}$ such that $i =$ right($X$). All
those lists are sorted in increasing order of left($X$). The structure also allows an
element $X \in \mathcal{F}$ to be removed from the list AM[right($X$)] in $O(1)$ time. This
can be ensured, for instance, by using doubly linked lists to implement each list, and
the whole structure can easily be built in $O(n + m)$ time using bucket sorting.
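A possible construction of these values and of the AM lists is sketched below, using two bucket passes instead of a comparison sort. The name `left_right_and_am` is hypothetical, positions are numbered from 1 as in the text, and plain Python lists stand in for the doubly linked lists, so the $O(1)$ removal is not reproduced here.

```python
def left_right_and_am(pf_order, family):
    """left(X), right(X) and the AM lists, with positions numbered from 1.

    pf_order : the elements of V listed in the order of the final partition Pf
    Returns (left, right, am), where am[p] lists the indices of the sets X
    with right(X) = p, sorted by increasing left(X).
    """
    n = len(pf_order)
    pos = {v: p for p, v in enumerate(pf_order, start=1)}
    left, right = {}, {}
    for i, x in enumerate(family):
        if x:                                   # empty sets have no position
            left[i] = min(pos[v] for v in x)
            right[i] = max(pos[v] for v in x)
    # First bucket the sets by left(X), then distribute them on right(X):
    # each am[p] receives its sets in increasing order of left(X).
    by_left = [[] for _ in range(n + 1)]
    for i, l in left.items():
        by_left[l].append(i)
    am = [[] for _ in range(n + 1)]
    for bucket in by_left:
        for i in bucket:
            am[right[i]].append(i)
    return left, right, am
```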

Step 3 - Refining $P_V$ again and identifying all Max($X$). The main idea
is the following. Assume that at a step of the refinement process in LF order we
refine a part $C = \{v_1, \ldots, v_k\}$ of a partition $P$ by $Y \in \mathcal{F}$ and that it results in two
non-empty parts $C'C''$.

Lemma 8. Let $X \in \mathcal{F}$ such that $|X| \leq |Y|$, left($X$) $\in C'$ and right($X$) $\in C''$.
Then $Y =$ Max($X$).

[Note that if $|X| = |Y|$ then $X$ could be before $Y$ in LF order.]
Proof. Let $r$ be the row corresponding to $Y$ in BM. As left($X$) $\in C'$ and
right($X$) $\in C''$, BM$[r, \text{left}(X)] = 0$ and BM$[r, \text{right}(X)] = 1$, and $Y$
obviously overlaps $X$. As $|X| \leq |Y|$, Max($X$) $\neq \emptyset$. Moreover, the row $r$ is the
highest such that BM$[r, \text{left}(X)] = 0$ and BM$[r, \text{right}(X)] = 1$, since otherwise
the elements of $X$ would have been split by a set placed before $Y$ in the LF order.
Thus, by Lemma 6, $Y =$ Max($X$). □



The last phase of the algorithm thus consists in refining $P_V$ again by all $Y \in \mathcal{F}$
taken in LF order. We first initialize all values Max($X$) to $\emptyset$. Each time a new
split $C'C''$ appears (say between positions $l$ and $l+1$), for all $v \in C''$ all lists
AM[$v$] are inspected the following way: let $X$ be the top of one of those
lists; while left($X$) $\leq l$, $X$ is popped off the list and Max($X$) $\leftarrow Y$. After having
refined with $Y$, if no set $Y'$ with $|Y'| = |Y|$ remains after $Y$ in LF order, all sets of
the same size as $Y$ are removed from the AM structure.
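The three steps can be put together in a compact sketch. It follows the logic of the text but not its data structures: partitions are lists of lists, and a plain set of still-active indices replaces the AM lists, so the running time is not linear. The name `max_by_refinement`, the tie-breaking of LF by index and the index-based output are assumptions.

```python
def max_by_refinement(universe, family):
    """Three-step computation of Max(X), written for clarity, not speed."""
    lf = sorted(range(len(family)), key=lambda i: (-len(family[i]), i))

    def refine_all(on_split):
        partition = [list(universe)]               # the initial partition P_V
        for y in lf:
            refined = []
            for part in partition:
                inside = [v for v in part if v in family[y]]
                outside = [v for v in part if v not in family[y]]
                if inside and outside:
                    on_split(y, set(outside), set(inside))
                    refined += [outside, inside]   # C' then C''
                else:
                    refined.append(part)
            partition = refined
        return partition

    # Step 1: the final partition Pf gives the column order.
    pf = [v for part in refine_all(lambda *_: None) for v in part]
    pos = {v: p for p, v in enumerate(pf)}

    # Step 2: leftmost and rightmost element of every non-empty set.
    left = {i: min(x, key=pos.get) for i, x in enumerate(family) if x}
    right = {i: max(x, key=pos.get) for i, x in enumerate(family) if x}

    # Step 3: refine again; a split by Y settles every still-active X with
    # |X| <= |Y| whose extreme elements fall on either side of the split.
    max_of = [None] * len(family)
    active = set(left)

    def on_split(y, c1, c2):
        for i in list(active):
            if len(family[i]) > len(family[y]):
                continue                           # Y is too small for X
            if left[i] in c1 and right[i] in c2:
                max_of[i] = y
                active.discard(i)

    refine_all(on_split)
    return max_of
```

Replacing the scan of `active` by the sorted AM lists and the list-based partition by the table described further on is what brings the cost down to the bound of Theorem 1.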

Lemma 9. Our algorithm correctly computes in 3 steps all Max($X$), $X \in \mathcal{F}$.

Proof. In step 1 the lexicographical order of the columns of BM is computed
as a partition $P_f$ (Lemma 7). In step 2 all values left($X$) and right($X$), $X \in \mathcal{F}$,
are computed and the AM structure is built. In step 3, the correctness of
the computation relies on the following observation: for each new partition $P$
created after a refinement, all sets $X$ remaining in AM are such that left($X$) and
right($X$) belong to the same part in $P$. This is obviously true since otherwise
they would have been split by a previous refinement and removed from AM. As a
consequence, after a split of a part $C$ into $C'C''$ by a set $Y$, testing
whether left($X$) $\in C'$ and right($X$) $\in C''$ for all sets in AM is equivalent to testing whether
right($X$) $\in C''$ and left($X$) $\leq l$, where $l$ is the left position in $P$ of the split
between $C'$ and $C''$. Moreover, as each set taken in LF order and used for a
possible refinement is removed from AM after having processed all the sets of the
same size, when a set $Y$ splits a part $C$ into $C'C''$, all sets in AM are such that
$|X| \leq |Y|$. We thus fulfill all requirements of Lemma 8 and $Y =$ Max($X$). Thus,
if a value Max($X$) is assigned by our algorithm, it is assigned the right one.

Now, suppose that a set $X$ admits a set $Y$ as Max($X$). It is guaranteed that at
a certain step of the algorithm $Y$ is assigned to Max($X$), since by
definition $|X| \leq |Y|$, which implies that $X$ is still in AM when $Y$ is processed,
and since by Lemma 6 left($X$) $\notin Y$ and right($X$) $\in Y$. The set $Y$ has thus split
a part $C$ of a partition into $C'C''$ such that right($X$) $> l$ and left($X$) $\leq l$, where $l$
is the left position in $P$ of the split between $C'$ and $C''$. □

It remains to explain how a partition refinement can be efficiently implemented.
We exploit the fact that the order of the elements inside each part of a partition has no
importance to obtain a simple implementation: a partition is represented as a
table of size $n$ in which each cell contains (a) an element of $V$ and (b) a pointer
to the part of the partition in which it is contained. A part is represented by the
pair of its bounds in this table. Figure 4 shows such an implementation.
Refining a partition $P$ by a set $Y$ can be done in $O(|Y|)$ time the following way. Let
$[i, j]$ be the bounds of a part $C$ such that $C \not\subseteq Y$ (easily testable). Let $k$ be the
number of elements of $Y$ that belong to the subtable $[i, j]$, $1 \leq k \leq j - i$. We
swap elements in the subtable $[i, j]$ to place all $k$ elements belonging to $Y$ at the
end of this subtable. We then adjust the bounds of $C$ to $[i, j - k]$ and create a
new part $[j - k + 1, j]$ to which the $k$ elements of $Y$ now point.
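A minimal sketch of this table-based representation follows. The class name `OrderedPartition` and its attribute names are hypothetical, indices start at 0, dictionaries play the role of the pointers, and every element of `y` is assumed to belong to $V$. One call to `refine` touches only the elements of `y`, in line with the $O(|Y|)$ bound.

```python
class OrderedPartition:
    """Ordered partition of V stored in a single table of size n."""

    def __init__(self, universe):
        self.elems = list(universe)                  # the table itself
        self.pos = {v: i for i, v in enumerate(self.elems)}
        self.bounds = [[0, len(self.elems) - 1]]     # [i, j] of each part
        self.part_of = {v: 0 for v in self.elems}    # element -> part id

    def refine(self, y):
        """Split every part cut by y; the elements of y move to the right."""
        hit = {}                                     # part id -> elements of y seen
        for v in y:
            p = self.part_of[v]
            i, j = self.bounds[p]
            k = hit.get(p, 0)
            # Swap v with the cell just left of the block of elements of y
            # already gathered at the end of the subtable [i, j].
            target, src = j - k, self.pos[v]
            w = self.elems[target]
            self.elems[src], self.elems[target] = w, v
            self.pos[w], self.pos[v] = src, target
            hit[p] = k + 1
        for p, k in hit.items():
            i, j = self.bounds[p]
            if k == j - i + 1:
                continue                             # part included in y: no split
            self.bounds[p][1] = j - k                # C' keeps the identity of C
            new_id = len(self.bounds)
            self.bounds.append([j - k + 1, j])       # C''
            for t in range(j - k + 1, j + 1):
                self.part_of[self.elems[t]] = new_id

    def parts(self):
        """Parts from left to right, each as a set (for inspection only)."""
        return [set(self.elems[i:j + 1]) for i, j in sorted(self.bounds)]
```

The `parts` helper sorts the bounds and is therefore not linear; it is there only to inspect the result.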

Theorem 1. The identification of all Max($X$), $X \in \mathcal{F}$, using partition
refinement can be done in $O(n + |\mathcal{F}|)$ time.

Fig. 4. Example continued: implementation of $P = \{a\}\{i, j, k, l\}\{b\}\{c, d\}\{e, f, g, h\}$.

Proof. By Lemma 9 the algorithm is correct. Steps 1 and 2 take $O(|\mathcal{F}| + n)$ time. In
step 3, the fact that all lists in AM are sorted in increasing order of left() values
ensures that when a set $Y$ splits a part $C$ into $C'C''$, identifying and popping off all
sets $X$ such that left($X$) $\in C'$ and right($X$) $\in C''$ can be done in $\Theta(|C''| + K + 1)$
time, where $K$ is the number of such sets. Removing a set from AM takes $O(1)$
time, thus the total time spent managing AM is $\Theta(|\mathcal{F}| + n)$. □

The whole algorithm has been implemented in its real worst case time
complexity and is freely available in [?].



4 Computing a subgraph of the overlap graph 

In some applications, like in [3], it is useful to get a spanning tree of each overlap
class of $OG(\mathcal{F}, E)$. The approach of [3] is to first compute Dahlhaus's graph
and then compute spanning trees of the connected components of the overlap
graph using a quite complex add-on. We thus explain in this section how to
simply modify Dahlhaus's approach to compute a subgraph of the overlap graph
instead of $D(\mathcal{F}, L)$. The size of the subgraph is linear, but it has the same
connected components as the overlap graph, and it is thus easy to compute
spanning trees of the overlap graph from it. The idea of the modification is the following.

Lemma 10. Let $X, Y \in \mathcal{F}$ such that $X \cap Y \neq \emptyset$, such that Max($X$) $\neq \emptyset$ and such
that $|X| \leq |Y| \leq |$Max($X$)$|$. Let $r_Y$ be the row of $Y$ in BM. If BM$[r_Y, \text{left}(X)] = 0$,
$Y$ overlaps $X$. Otherwise, (a) if BM$[r_Y, \text{right}(X)] = 0$, then $Y$ overlaps $X$,
and (b) if BM$[r_Y, \text{right}(X)] = 1$, then $Y$ overlaps Max($X$).

Proof. Let $r_X$ be the row of $X$ in BM, and $r$ that of Max($X$). If BM$[r_Y, \text{left}(X)] = 0$,
as BM$[r_X, \text{left}(X)] = 1$, $X \cap Y \neq \emptyset$ and $|X| \leq |Y|$, $Y$ overlaps $X$.
Assume now that BM$[r_Y, \text{left}(X)] = 1$. Case (a): if BM$[r_Y, \text{right}(X)] = 0$, then,
as BM$[r_X, \text{right}(X)] = 1$, with the same arguments as above $Y$ overlaps $X$.
Case (b): if BM$[r_Y, \text{right}(X)] = 1$, then, as by Lemma 6 BM$[r, \text{right}(X)] = 1$
and BM$[r, \text{left}(X)] = 0$, and as $|Y| \leq |$Max($X$)$|$, $Y$ overlaps Max($X$). □
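The case analysis of Lemma 10 amounts to two membership tests on $Y$, as the following sketch shows. The function name `overlapped_by` is hypothetical, and `left_x` and `right_x` are assumed to be the elements of $X$ lying in columns left($X$) and right($X$); the hypotheses of the lemma are not checked by the code.

```python
def overlapped_by(y, x, max_x, left_x, right_x):
    """Return the set, X or Max(X), that Y overlaps according to Lemma 10.

    Assumes that Y meets X and that |X| <= |Y| <= |Max(X)|.
    """
    if left_x not in y:
        return x        # BM[r_Y, left(X)] = 0
    if right_x not in y:
        return x        # case (a): BM[r_Y, right(X)] = 0
    return max_x        # case (b): both extreme columns hold a 1
```

For instance, with $X = \{a, b\}$, Max($X$) $= \{b, c, d\}$, left element $a$ and right element $b$, the set $Y = \{a, b, e\}$ contains $X$ and falls in case (b).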

We modify the construction of Dahlhaus's graph the following way. We
still consider intervals $X..YW..$ on $SL(v)$ lists such that Max($X$) $\neq \emptyset$ and
$|W| \leq |$Max($X$)$|$, but instead of creating a chain $X - .. - Y - W - ..$ in $D(\mathcal{F}, L)$, we
create an edge ($X$, Max($X$)) (if it does not already exist) and a list of quintuples
(left($X$), right($X$), $X$, $Y$, Max($X$)), (left($X$), right($X$), $X$, $W$, Max($X$)), .. for
all the elements in the interval distinct from $X$ and Max($X$). All quintuples for
all intervals are placed in the same list $LQ_1$. Note that if an element belongs to
2 intervals, a unique quintuple is formed with the rightmost interval.

Fig. 5. Global example (continued): (A) input family of 11 sets; (B) $LQ_1$ and
$LQ_2$ lists in which right($X$) and left($X$) have been replaced by $Pf_{right(X)}$ and
$Pf_{left(X)}$; (C) SL lists; (D) the resulting subgraph of the overlap graph.

To apply Lemma 10, it suffices for each (left($X$), right($X$), $X$, $Y$, Max($X$))
to test if $Y$ belongs to $SL(Pf_{left(X)})$. If not, we create an edge $(X, Y)$.
Otherwise, we test if $Y$ belongs to $SL(Pf_{right(X)})$. If not, we also create an edge
$(X, Y)$. However, if it does, we create an edge ($Y$, Max($X$)).

For complexity reasons we need to perform those tests at once for all
quintuples in $LQ_1$. We do it in two phases. In the first phase we search for all
$Y$ in $SL(Pf_{left(X)})$. If $Y$ does not belong to $SL(Pf_{left(X)})$, the edge $(X, Y)$ is created;
otherwise we add the quintuple
(left($X$), right($X$), $X$, $Y$, Max($X$)) to a second list $LQ_2$. In the second phase, if
$LQ_2$ is not empty, for all (left($X$), right($X$), $X$, $Y$, Max($X$)) in $LQ_2$ we search $Y$
in $SL(Pf_{right(X)})$.

We assume below that all $SL(v)$ lists are sorted according to the LF order
instead of being simply sorted by increasing sizes. To efficiently compare $LQ_1$
with all $SL(v)$ lists it suffices to sort the list $LQ_1$ according to left($X$) and then
sort all quintuples with the same left($X$) value in the LF order of $Y$. This can be
done in $O(n + |\mathcal{F}|)$ time using bucket sorting. The comparison of $LQ_1$ and the
tables $SL()$ can then be done in $O(n + |\mathcal{F}|)$ time by comparing simultaneously
$|V|$ sorted lists. The same approach holds for $LQ_2$. We thus have:

Theorem 2. A subgraph of the overlap graph of $\mathcal{F}$ having the same connected
components can be computed in $O(n + |\mathcal{F}|)$ time.

Proof. Lemma 10 ensures that the new graph is a subgraph of the overlap graph.
To prove that they have the same connected components, it thus suffices to prove
that if two sets $A$ and $B$ overlap, there exists a path connecting $A$ and $B$ in
the subgraph. The following observation is the base of the proof: let $X..Y..Z$ be
sorted by increasing size on the same $SL(v)$ and such that $|Y| \leq |$Max($X$)$|$,
$|$Max($X$)$| \leq |$Max($Y$)$|$ and $|Z| \leq |$Max($Y$)$|$. Then there exists a path between
all sets $X$, $Y$, $Z$ in the new subgraph since by construction $X$ and Max($X$) are
connected, $Y$ is connected to $X$ or Max($X$), $Y$ is connected to Max($Y$) and
finally $Z$ is connected to $Y$ or Max($Y$).

Now let $v \in A \cap B$. Assume w.l.o.g. that $|A| \leq |B|$. Then Max($A$) $\neq \emptyset$ and
$|$Max($A$)$| \geq |B|$. Therefore, in $SL(v)$, there exists a series (potentially empty)
of $k$ sets $A..Y_1..Y_2..Y_k..B$ such that $|B| \leq |$Max($Y_k$)$|$, $|Y_k| \leq |$Max($Y_{k-1}$)$|$, and
$|Y_1| \leq |$Max($A$)$|$. By induction on the series using the previous observation there
exists a path from $A$ to $B$ in the subgraph.

The subgraph can obviously be built in $O(n + |\mathcal{F}|)$ time since all steps can
be done in this time. □

An example (continued) of the resulting subgraph is shown in Figure 5.